{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "view-in-github"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/Iallen520/lhy_DL_Hw/blob/master/hw1_regression.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "p9FfatPz6MU3"
   },
   "source": [
    "# **Homework 1: Linear Regression**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "fsoaNwrZA0ui"
   },
   "source": [
    "本次目標：由前 9 個小時的 18 個 features (包含 PM2.5)預測的 10 個小時的 PM2.5。<!-- 可以參考 <link> 獲知更細項的作業說明。-->\n",
    "\n",
    "<!-- 首先，從 https://drive.google.com/open?id=1El0zvTkrSuqCTDcMpijXpADvJzZC2Jpa 將整個資料夾下載下來，並將下載下來的資料夾放到自己的 Google Drive（注意：上傳到自己 Google Drive 的是資料夾 hw1-regression，而非壓縮檔） -->\n",
    "\n",
    "\n",
    "若有任何問題，歡迎來信至助教信箱 ntu-ml-2020spring-ta@googlegroups.com"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "U7RiAkkjCc6l"
   },
   "source": [
    "# **Load 'train.csv'**\n",
    "train.csv 的資料為 12 個月中，每個月取 20 天，每天 24 小時的資料(每小時資料有 18 個 features)。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "1AfNX-hB3kN8"
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "data = pd.read_csv('train.csv', encoding = 'big5')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "gqUdj00pDTpo"
   },
   "source": [
    "# **Preprocessing** \n",
    "取需要的數值部分，將 'RAINFALL' 欄位全部補 0。\n",
    "另外，如果要在 colab 重覆這段程式碼的執行，請從頭開始執行(把上面的都重新跑一次)，以避免跑出不是自己要的結果（若自己寫程式不會遇到，但 colab 重複跑這段會一直往下取資料。意即第一次取原本資料的第三欄之後的資料，第二次取第一次取的資料掉三欄之後的資料，...）。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "AIGP7XUYD_Yb"
   },
   "outputs": [],
   "source": [
    "data = data.iloc[:, 3:]\n",
    "data[data == 'NR'] = 0\n",
    "raw_data = np.array(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(4320, 24)"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "raw_data.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "V7PCrVwX6jBF"
   },
   "source": [
    "# **Extract Features (1)**\n",
    "![圖片說明](https://drive.google.com/uc?id=1LyaqD4ojX07oe5oDzPO99l9ts5NRyArH)\n",
    "![圖片說明](https://drive.google.com/uc?id=1ZroBarcnlsr85gibeqEF-MtY13xJTG47)\n",
    "\n",
    "將原始 4320 * 18 的資料依照每個月分重組成 12 個 18 (features) * 480 (hours) 的資料。 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "HBnrGYXu9dZQ"
   },
   "outputs": [],
   "source": [
    "month_data = {}\n",
    "for month in range(12):\n",
    "    sample = np.empty([18, 480])\n",
    "    for day in range(20):\n",
    "        sample[:, day * 24 : (day + 1) * 24] = \\\n",
    "            raw_data[18 * (20 * month + day) : 18 * (20 * month + day + 1), :]\n",
    "    month_data[month] = sample"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(18, 480)"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "month_data[1].shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "WhVmtFEQ9D6t"
   },
   "source": [
    "# **Extract Features (2)**\n",
    "![alt text](https://drive.google.com/uc?id=1wKoPuaRHoX682LMiBgIoOP4PDyNKsJLK)\n",
    "![alt text](https://drive.google.com/uc?id=1FRWWiXQ-Qh0i9tyx0LiugHYF_xDdkhLN)\n",
    "\n",
    "每個月會有 480hrs，每 9 小時形成一個 data，每個月會有 471 個 data，故總資料數為 471 * 12 筆，而每筆 data 有 9 * 18 的 features (一小時 18 個 features * 9 小時)。\n",
    "\n",
    "對應的 target 則有 471 * 12 個(第 10 個小時的 PM2.5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(18, 9)"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "month_data[0][:,0 * 24 + 0 : 0 * 24 + 0 + 9].shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "dcOrC4Fi-n3i"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[14.  14.  14.  ...  2.   2.   0.5]\n",
      " [14.  14.  13.  ...  2.   0.5  0.3]\n",
      " [14.  13.  12.  ...  0.5  0.3  0.8]\n",
      " ...\n",
      " [17.  18.  19.  ...  1.1  1.4  1.3]\n",
      " [18.  19.  18.  ...  1.4  1.3  1.6]\n",
      " [19.  18.  17.  ...  1.3  1.6  1.8]]\n",
      "[[30.]\n",
      " [41.]\n",
      " [44.]\n",
      " ...\n",
      " [17.]\n",
      " [24.]\n",
      " [29.]]\n"
     ]
    }
   ],
   "source": [
    "x = np.empty([12 * 471, 18 * 9], dtype = float)\n",
    "y = np.empty([12 * 471, 1], dtype = float)\n",
    "for month in range(12):\n",
    "    for day in range(20):\n",
    "        for hour in range(24):\n",
    "            if day == 19 and hour > 14:\n",
    "                continue\n",
    "            x[month * 471 + day * 24 + hour, :] = \\\n",
    "                month_data[month][:,day * 24 + hour : day * 24 + hour + 9].reshape(1, -1) #vector dim:18*9 (9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9)\n",
    "            y[month * 471 + day * 24 + hour, 0] = \\\n",
    "                month_data[month][9, day * 24 + hour + 9] #value\n",
    "print(x)\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "1wOii0TX8IwE"
   },
   "source": [
    "# **Normalize (1)**\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "ceMqFoNI8ftQ"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[-1.35825331, -1.35883937, -1.359222  , ...,  0.26650729,\n",
       "         0.2656797 , -1.14082131],\n",
       "       [-1.35825331, -1.35883937, -1.51819928, ...,  0.26650729,\n",
       "        -1.13963133, -1.32832904],\n",
       "       [-1.35825331, -1.51789368, -1.67717656, ..., -1.13923451,\n",
       "        -1.32700613, -0.85955971],\n",
       "       ...,\n",
       "       [-0.88092053, -0.72262212, -0.56433559, ..., -0.57693779,\n",
       "        -0.29644471, -0.39079039],\n",
       "       [-0.7218096 , -0.56356781, -0.72331287, ..., -0.29578943,\n",
       "        -0.39013211, -0.1095288 ],\n",
       "       [-0.56269867, -0.72262212, -0.88229015, ..., -0.38950555,\n",
       "        -0.10906991,  0.07797893]])"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mean_x = np.mean(x, axis = 0) #18 * 9 \n",
    "std_x = np.std(x, axis = 0) #18 * 9 \n",
    "for i in range(len(x)): #12 * 471\n",
    "    for j in range(len(x[0])): #18 * 9 \n",
    "        if std_x[j] != 0:\n",
    "            x[i][j] = (x[i][j] - mean_x[j]) / std_x[j]\n",
    "x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "NzvXP5Jya64j"
   },
   "source": [
    "#**Split Training Data Into \"train_set\" and \"validation_set\"**\n",
    "這部分是針對作業中 report 的第二題、第三題做的簡單示範，以生成比較中用來訓練的 train_set 和不會被放入訓練、只是用來驗證的 validation_set。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "feF4XXOQb5SC"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4521\n",
      "4521\n",
      "1131\n",
      "1131\n"
     ]
    }
   ],
   "source": [
    "import math\n",
    "x_train_set = x[: math.floor(len(x) * 0.8), :]\n",
    "y_train_set = y[: math.floor(len(y) * 0.8), :]\n",
    "x_validation = x[math.floor(len(x) * 0.8): , :]\n",
    "y_validation = y[math.floor(len(y) * 0.8): , :]\n",
    "print(len(x_train_set))\n",
    "print(len(y_train_set))\n",
    "print(len(x_validation))\n",
    "print(len(y_validation))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Q-qAu0KR_ZRR"
   },
   "source": [
    "# **Training**\n",
    "![alt text](https://drive.google.com/uc?id=1xIXvqZ4EGgmxrp7c9r0LOVbcvd4d9H4N)\n",
    "![alt text](https://drive.google.com/uc?id=1S42g06ON5oJlV2f9RukxawjbE4NpsaB6)\n",
    "![alt text](https://drive.google.com/uc?id=1BbXu-oPB9EZBHDQ12YCkYqtyAIil3bGj)\n",
    "\n",
    "(和上圖不同處: 下面的 code 採用 Root Mean Square Error)\n",
    "\n",
    "因為常數項的存在，所以 dimension (dim) 需要多加一欄；eps 項是避免 adagrad 的分母為 0 而加的極小數值。\n",
    "\n",
    "每一個 dimension (dim) 會對應到各自的 gradient, weight (w)，透過一次次的 iteration (iter_time) 學習。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "cCzDfxBFBFqp"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0:27.071214829194115\n",
      "100:33.78905859777454\n",
      "200:19.913751298197095\n",
      "300:13.53106819368969\n",
      "400:10.64546615844617\n",
      "500:9.277353455475062\n",
      "600:8.518042045956497\n",
      "700:8.01406198758842\n",
      "800:7.63675682477569\n",
      "900:7.336563740371123\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "(163, 1)"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dim = 18 * 9 + 1\n",
    "w = np.zeros([dim, 1])\n",
    "x = np.concatenate((np.ones([12 * 471, 1]), x), axis = 1).astype(float)\n",
    "learning_rate = 100\n",
    "iter_time = 1000\n",
    "adagrad = np.zeros([dim, 1])\n",
    "eps = 0.0000000001\n",
    "for t in range(iter_time):\n",
    "    loss = np.sqrt(np.sum(np.power(np.dot(x, w) - y, 2))/471/12)#rmse\n",
    "    if t % 100 == 0:\n",
    "        print(str(t) + \":\" + str(loss))\n",
    "    gradient = 2 * np.dot(x.transpose(), np.dot(x, w) - y) #dim*1\n",
    "    adagrad += gradient ** 2\n",
    "    w -= learning_rate * gradient / np.sqrt(adagrad + eps)\n",
    "np.save('weight.npy', w)\n",
    "w.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ZqNdWKsYBK28"
   },
   "source": [
    "# **Testing**\n",
    "![alt text](https://drive.google.com/uc?id=1165ETzZyE6HStqKvgR0gKrJwgFLK6-CW)\n",
    "\n",
    "載入 test data，並且以相似於訓練資料預先處理和特徵萃取的方式處理，使 test data 形成 240 個維度為 18 * 9 + 1 的資料。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "AALygqJFCWOA"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Anaconda3\\lib\\site-packages\\ipykernel_launcher.py:3: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame.\n",
      "Try using .loc[row_indexer,col_indexer] = value instead\n",
      "\n",
      "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n",
      "  This is separate from the ipykernel package so we can avoid doing imports until\n",
      "C:\\Anaconda3\\lib\\site-packages\\pandas\\core\\frame.py:3163: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame\n",
      "\n",
      "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n",
      "  self._where(-key, value, inplace=True)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "(240, 163)"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "testdata = pd.read_csv('test.csv', header = None, encoding = 'big5')\n",
    "test_data = testdata.iloc[:, 2:]\n",
    "test_data[test_data == 'NR'] = 0\n",
    "test_data = np.array(test_data)\n",
    "test_x = np.empty([240, 18*9], dtype = float)\n",
    "for i in range(240):\n",
    "    test_x[i, :] = test_data[18 * i: 18 * (i + 1), :].reshape(1, -1)\n",
    "for i in range(len(test_x)):\n",
    "    for j in range(len(test_x[0])):\n",
    "        if std_x[j] != 0:\n",
    "            test_x[i][j] = (test_x[i][j] - mean_x[j]) / std_x[j]\n",
    "test_x = np.concatenate((np.ones([240, 1]), test_x), axis = 1).astype(float)\n",
    "test_x.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "dJQks9JEHR6W"
   },
   "source": [
    "# **Prediction**\n",
    "說明圖同上\n",
    "\n",
    "![alt text](https://drive.google.com/uc?id=1165ETzZyE6HStqKvgR0gKrJwgFLK6-CW)\n",
    "\n",
    "有了 weight 和測試資料即可預測 target。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "jNyB229jHsEQ"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[  5.1749604 ],\n",
       "       [ 18.30621425],\n",
       "       [ 20.49121809],\n",
       "       [ 11.52394287],\n",
       "       [ 26.61605675],\n",
       "       [ 20.53134808],\n",
       "       [ 21.90655102],\n",
       "       [ 31.73646875],\n",
       "       [ 13.39167406],\n",
       "       [ 64.4564665 ],\n",
       "       [ 20.26456884],\n",
       "       [ 15.35857608],\n",
       "       [ 68.58947277],\n",
       "       [ 48.42811375],\n",
       "       [ 18.70233382],\n",
       "       [ 10.18859574],\n",
       "       [ 30.74036286],\n",
       "       [ 71.13221776],\n",
       "       [ -4.13051739],\n",
       "       [ 18.23569402],\n",
       "       [ 38.57892228],\n",
       "       [ 71.31151973],\n",
       "       [  7.41034816],\n",
       "       [ 18.71795533],\n",
       "       [ 14.93725026],\n",
       "       [ 36.71973669],\n",
       "       [ 17.96169701],\n",
       "       [ 75.78946287],\n",
       "       [ 12.30931025],\n",
       "       [ 56.29535174],\n",
       "       [ 25.11316087],\n",
       "       [  4.61024867],\n",
       "       [  2.48377055],\n",
       "       [ 24.75942226],\n",
       "       [ 30.48028047],\n",
       "       [ 38.46393075],\n",
       "       [ 44.20231061],\n",
       "       [ 30.08683602],\n",
       "       [ 40.47367502],\n",
       "       [ 29.2264799 ],\n",
       "       [  5.60645605],\n",
       "       [ 38.66601608],\n",
       "       [ 34.61021343],\n",
       "       [ 48.38969751],\n",
       "       [ 14.75724767],\n",
       "       [ 34.46682011],\n",
       "       [ 27.48310687],\n",
       "       [ 12.00087938],\n",
       "       [ 21.37803615],\n",
       "       [ 28.54440309],\n",
       "       [ 20.16551382],\n",
       "       [ 10.79667815],\n",
       "       [ 22.17103576],\n",
       "       [ 53.44626311],\n",
       "       [ 12.21958112],\n",
       "       [ 43.30096846],\n",
       "       [ 32.1823351 ],\n",
       "       [ 22.56721751],\n",
       "       [ 56.73951417],\n",
       "       [ 20.74505295],\n",
       "       [ 15.02885456],\n",
       "       [ 39.85530159],\n",
       "       [ 12.97534068],\n",
       "       [ 51.74165959],\n",
       "       [ 18.78336963],\n",
       "       [ 12.34875284],\n",
       "       [ 15.63362365],\n",
       "       [ -0.05887147],\n",
       "       [ 41.50801107],\n",
       "       [ 31.54874753],\n",
       "       [ 18.60425116],\n",
       "       [ 37.47681972],\n",
       "       [ 56.52039066],\n",
       "       [  6.58787719],\n",
       "       [ 12.22933974],\n",
       "       [  5.2036964 ],\n",
       "       [ 47.9273751 ],\n",
       "       [ 13.02070569],\n",
       "       [ 17.11030169],\n",
       "       [ 20.60323453],\n",
       "       [ 21.28448156],\n",
       "       [ 38.69293529],\n",
       "       [ 30.02071668],\n",
       "       [ 88.76740667],\n",
       "       [ 35.98470024],\n",
       "       [ 26.75691355],\n",
       "       [ 23.96351684],\n",
       "       [ 32.74724283],\n",
       "       [ 22.18904376],\n",
       "       [ 20.99215885],\n",
       "       [ 29.55599432],\n",
       "       [ 40.99216887],\n",
       "       [  8.62511781],\n",
       "       [ 32.32147181],\n",
       "       [ 46.59804437],\n",
       "       [ 22.88407083],\n",
       "       [ 31.51812973],\n",
       "       [ 11.19823348],\n",
       "       [ 28.52743664],\n",
       "       [  0.29115068],\n",
       "       [ 17.96696108],\n",
       "       [ 27.12416393],\n",
       "       [ 11.39823278],\n",
       "       [ 16.42642687],\n",
       "       [ 23.42526105],\n",
       "       [ 40.61608267],\n",
       "       [ 25.86412503],\n",
       "       [  5.42273695],\n",
       "       [ 10.79492112],\n",
       "       [ 72.86213693],\n",
       "       [ 48.02283706],\n",
       "       [ 15.74680828],\n",
       "       [ 24.67041061],\n",
       "       [ 12.82779333],\n",
       "       [ 10.15805757],\n",
       "       [ 27.26922334],\n",
       "       [ 29.20873858],\n",
       "       [  8.83533962],\n",
       "       [ 20.05108814],\n",
       "       [ 20.21233374],\n",
       "       [ 79.9060093 ],\n",
       "       [ 18.06161429],\n",
       "       [ 30.54280934],\n",
       "       [ 25.98079238],\n",
       "       [  5.21257727],\n",
       "       [ 30.35569731],\n",
       "       [  7.76832289],\n",
       "       [ 15.32826826],\n",
       "       [ 22.66636572],\n",
       "       [ 62.74205421],\n",
       "       [ 18.95078037],\n",
       "       [ 19.07635563],\n",
       "       [ 61.37157409],\n",
       "       [ 15.88456205],\n",
       "       [ 13.40941808],\n",
       "       [  0.84877248],\n",
       "       [  7.83499672],\n",
       "       [ 57.01282901],\n",
       "       [ 25.60799675],\n",
       "       [  4.96170473],\n",
       "       [ 36.41487904],\n",
       "       [ 28.79000672],\n",
       "       [ 49.19412096],\n",
       "       [ 40.30686986],\n",
       "       [ 13.31618059],\n",
       "       [ 27.66101188],\n",
       "       [ 17.15802752],\n",
       "       [ 49.68726257],\n",
       "       [ 23.03027229],\n",
       "       [ 39.24093652],\n",
       "       [ 13.19675389],\n",
       "       [  5.9488937 ],\n",
       "       [ 25.82160898],\n",
       "       [  8.25863421],\n",
       "       [ 19.14632052],\n",
       "       [ 43.18248653],\n",
       "       [  6.71784358],\n",
       "       [ 33.86961525],\n",
       "       [ 15.36993785],\n",
       "       [ 16.93904497],\n",
       "       [ 37.88533679],\n",
       "       [ 19.20248454],\n",
       "       [  9.05950472],\n",
       "       [ 10.28339961],\n",
       "       [ 48.67244713],\n",
       "       [ 30.58771621],\n",
       "       [  2.4774099 ],\n",
       "       [ 12.81160394],\n",
       "       [ 70.32478981],\n",
       "       [ 14.84096769],\n",
       "       [ 68.86558757],\n",
       "       [ 42.74199244],\n",
       "       [ 24.00026154],\n",
       "       [ 23.42072486],\n",
       "       [ 61.67212444],\n",
       "       [ 25.49420285],\n",
       "       [ 19.00480979],\n",
       "       [ 34.88668288],\n",
       "       [  9.4023134 ],\n",
       "       [ 29.52001131],\n",
       "       [ 14.57396589],\n",
       "       [  9.12556314],\n",
       "       [ 52.812584  ],\n",
       "       [ 45.03953799],\n",
       "       [ 17.45243468],\n",
       "       [ 38.49393528],\n",
       "       [ 27.03891909],\n",
       "       [ 65.58170967],\n",
       "       [  7.03730638],\n",
       "       [ 52.71447713],\n",
       "       [ 38.20645934],\n",
       "       [ 21.16980106],\n",
       "       [ 30.24755688],\n",
       "       [  2.71442299],\n",
       "       [ 19.93293259],\n",
       "       [ -3.41333234],\n",
       "       [ 32.4459994 ],\n",
       "       [ 10.58297303],\n",
       "       [ 21.77522571],\n",
       "       [ 62.46529207],\n",
       "       [ 24.13294369],\n",
       "       [ 26.20123965],\n",
       "       [ 63.74447723],\n",
       "       [  2.83429777],\n",
       "       [ 14.37924699],\n",
       "       [  9.36985073],\n",
       "       [  9.88116661],\n",
       "       [  3.49494536],\n",
       "       [122.60804938],\n",
       "       [ 21.08351301],\n",
       "       [ 17.5322206 ],\n",
       "       [ 20.18309834],\n",
       "       [ 36.39313221],\n",
       "       [ 34.93515121],\n",
       "       [ 18.83031266],\n",
       "       [ 38.34455552],\n",
       "       [ 77.91663414],\n",
       "       [  1.79532355],\n",
       "       [ 13.44582794],\n",
       "       [ 36.13115559],\n",
       "       [ 15.1504035 ],\n",
       "       [ 12.94184833],\n",
       "       [113.12524094],\n",
       "       [ 15.22460468],\n",
       "       [ 14.82402597],\n",
       "       [ 59.26735369],\n",
       "       [ 10.58369529],\n",
       "       [ 20.99306256],\n",
       "       [  9.78936588],\n",
       "       [  4.77118001],\n",
       "       [ 47.9278069 ],\n",
       "       [ 12.39943839],\n",
       "       [ 48.14647656],\n",
       "       [ 40.4663804 ],\n",
       "       [ 16.94059027],\n",
       "       [ 41.26654449],\n",
       "       [ 69.02789203],\n",
       "       [ 40.34624924],\n",
       "       [ 14.31374398],\n",
       "       [ 15.77072663]])"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.set_printoptions(suppress=True)\n",
    "\n",
    "w = np.load('weight.npy')\n",
    "ans_y = np.dot(test_x, w)\n",
    "ans_y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "HKMKW7RzHwuO"
   },
   "source": [
    "# **Save Prediction to CSV File**\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "Dwfpqqy0H8en"
   },
   "outputs": [],
   "source": [
    "import csv\n",
    "\n",
    "with open('submit.csv', mode='w', newline='') as submit_file:\n",
    "    csv_writer = csv.writer(submit_file)\n",
    "    header = ['id', 'value']\n",
    "    csv_writer.writerow(header)\n",
    "    for i in range(240):\n",
    "        row = ['id_' + str(i), ans_y[i][0]]\n",
    "        csv_writer.writerow(row)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Y54yWq9cIPR4"
   },
   "source": [
    "相關 reference 可以參考:\n",
    "\n",
    "Adagrad :\n",
    "https://youtu.be/yKKNr-QKz2Q?list=PLJV_el3uVTsPy9oCRY30oBPNLCo89yu49&t=705 \n",
    "\n",
    "RMSprop : \n",
    "https://www.youtube.com/watch?v=5Yt-obwvMHI \n",
    "\n",
    "Adam\n",
    "https://www.youtube.com/watch?v=JXQT_vxqwIs \n",
    "\n",
    "\n",
    "以上 print 的部分主要是為了看一下資料和結果的呈現，拿掉也無妨。另外，在自己的 linux 系統，可以將檔案寫死的的部分換成 sys.argv 的使用 (可在 terminal 自行輸入檔案和檔案位置)。\n",
    "\n",
    "最後，可以藉由調整 learning rate、iter_time (iteration 次數)、取用 features 的多寡(取幾個小時，取哪些特徵欄位)，甚至是不同的 model 來超越 baseline。\n",
    "\n",
    "Report 的問題模板請參照 : https://docs.google.com/document/d/1s84RXs2AEgZr54WCK9IgZrfTF-6B1td-AlKR9oqYa4g/edit"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "include_colab_link": true,
   "name": "hw1_regression.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
